    Data quality problems in TPC-DI based data integration processes

    Many data-driven organisations need to integrate data from multiple, distributed and heterogeneous sources for advanced data analysis. A data integration system is an essential component for collecting data into a data warehouse or other data analytics systems. Various data integration systems are available, either built in-house or provided by vendors, so an organisation needs to compare and benchmark them when choosing one that meets its requirements. TPC-DI was recently proposed as the first industrial benchmark for evaluating data integration systems. When using this benchmark, we found typical data quality problems in the TPC-DI data source, such as multi-meaning attributes and inconsistent data schemas, which can delay or even cause the failure of the data integration process. This paper explains the processes of this benchmark and summarises the typical data quality problems identified in the TPC-DI data source. Furthermore, to prevent data quality problems and proactively manage data quality, we propose a set of practical guidelines for researchers and practitioners conducting data quality management with the TPC-DI benchmark.

    Classification Methodology for Architectures in Information Systems: A Statistical Converging Technique

    Architectures are critical to the Information System (IS) domain because they represent the fundamental structures and interactions of systems. Since analysing architectural similarities is challenging and time-consuming even within one domain, IS architecture classifications are paramount to understanding architectural complexity. However, the classification approaches used in existing research commonly rely on manual intervention, which hampers the reliability of architectural classification. We propose a novel methodology based on component modelling and the application of a statistical converging technique, which ensures reliable IS architectural classification and minimises subjective intervention. We demonstrate the methodology by classifying data warehouse architectures.

    A Comparative Study of Mouse Hepatic and Intestinal Gene Expression Profiles under PPARα Knockout by Gene Set Enrichment Analysis

    Gene expression profiling of PPARα has been used in several studies, but few studies have gone further to identify, genome-wide, the tissue-specific pathways or genes involved in PPARα activation. Here, we applied gene set enrichment analysis to two PPARα-related microarray datasets, one from mouse liver and one from mouse intestine. Our results suggest that the regulatory mechanism of PPARα activation by WY14643 is more complicated in the mouse small intestine than in the liver, because more pathways are involved. Several of these pathways were cancer-related, such as those for pancreatic cancer and small cell lung cancer, indicating that PPARα may play an important role in preventing cancer development. Twelve PPARα-dependent pathways and four PPARα-independent pathways were found to be common to both the liver and the intestine of mice. Most of them were metabolism-related: fatty acid metabolism, tryptophan metabolism and pyruvate metabolism were regulated by PPARα, whereas gluconeogenesis and propanoate metabolism were independent of PPARα regulation. Keratan sulfate biosynthesis, regulation of the actin cytoskeleton, and the pathways associated with prostate cancer and small cell lung cancer were not identified as PPARα-independent in the liver, but were identified as WY14643-dependent in the intestinal study. We also report several novel hepatic tissue-specific marker genes.
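The abstract does not spell out how gene set enrichment analysis scores a pathway. As a hedged illustration, the sketch below computes the running-sum enrichment score used in standard GSEA: genes are ranked by a per-gene statistic, the running sum rises at genes in the set (weighted by |score|^p) and falls by a constant step at genes outside it, and the enrichment score is the maximum deviation from zero. Permutation-based significance testing is omitted, the function name `enrichment_score` and the toy gene names are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, Set


def enrichment_score(scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Weighted running-sum enrichment score (ES) for one gene set.

    scores: per-gene ranking statistic (e.g. correlation with the phenotype);
    genes are ranked by decreasing score.
    p: weight exponent; p = 0 gives the unweighted Kolmogorov-Smirnov-like
    statistic, p = 1 the default weighting of standard GSEA.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Normaliser for the "hit" steps: total weight of the genes in the set.
    n_r = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    if n_r == 0:
        raise ValueError("all genes in the set have zero weight")
    # Every gene outside the set lowers the running sum by the same amount.
    miss_step = 1.0 / (n - n_hit)

    running, es = 0.0, 0.0
    for (_, s), hit in zip(ranked, hits):
        running += abs(s) ** p / n_r if hit else -miss_step
        if abs(running) > abs(es):  # ES = maximum deviation from zero
            es = running
    return es
```

A set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom a negative score; with p = 1, strongly ranked genes contribute larger upward steps than weakly ranked ones.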

    Grey theory based BP-NN co-training for dense sequence long-term tendency prediction

    The file attached to this record is the author's final peer-reviewed version. Purpose - The purpose of this paper is to solve problems in topic popularity prediction in online social networks and to advance a fine-grained, long-term prediction model for situations with insufficient data. Design/methodology/approach - Based on GM(1,1) and neural networks, a co-training model for topic tendency prediction is proposed. Interpolation based on GM(1,1) is employed to generate fine-grained prediction values for topic popularity time series, and two neural network models achieve convergence by transmitting training parameters via their loss functions. Findings - The experimental results indicate that the integrated model can effectively predict dense sequences, with higher performance than other algorithms such as NN and RBF_LSSVM. Furthermore, a Markov chain state transition probability matrix model is used to improve the prediction results. Practical implications - For fine-grained, long-term topic popularity prediction, further improvement could be made by predicting arbitrary interpolated values within the time intervals between popularity data points. Originality/value - The paper constructs a co-training model with GM(1,1) and neural networks, and deploys a Markov chain state transition probability matrix to further improve popularity tendency prediction.
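The abstract names GM(1,1) as the basis of the interpolation step but does not give its equations. The sketch below shows the textbook GM(1,1) grey model in plain Python: the accumulated generating operation (1-AGO), least-squares estimation of the development coefficient a and grey input b, and the exponential time-response function followed by inverse AGO. It is an illustrative assumption, not the authors' code: the BP-NN co-training, the interpolation scheme and the Markov-chain correction are not reproduced, and the names `gm11_fit` and `gm11_predict`, as well as the minimum series length of 4, are hypothetical choices.

```python
import math
from typing import List, Tuple


def gm11_fit(x0: List[float]) -> Tuple[float, float]:
    """Estimate GM(1,1) parameters (a, b) by least squares.

    Whitening equation: dx1/dt + a*x1 = b, where x1 is the accumulated
    (1-AGO) series of the non-negative input x0.
    """
    if len(x0) < 4:  # assumed minimum length (a common rule of thumb)
        raise ValueError("GM(1,1) needs at least 4 observations")
    # 1-AGO: x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values z1(k) = (x1(k) + x1(k-1)) / 2 for k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Grey differential equation x0(k) = -a*z1(k) + b, solved as an ordinary
    # least-squares line fit of y on z (closed form for two parameters).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    a, b = -slope, intercept
    if a == 0:
        raise ValueError("degenerate series: development coefficient a is 0")
    return a, b


def gm11_predict(x0: List[float], steps: int) -> List[float]:
    """Return fitted values for x0 followed by `steps` forecasts."""
    a, b = gm11_fit(x0)

    def x1_hat(k: int) -> float:  # time-response function, k = 0, 1, 2, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO recovers the series on the original scale.
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, len(x0) + steps)]
```

Under these assumptions, `gm11_predict(series, 3)` returns the fitted series followed by three forecasts. For a purely geometric input the fit is close but not exact, because the trapezoidal background value only approximates the integral of x1.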

    ReCGiP, a database of reproduction candidate genes in pigs based on bibliomics

    Background: Reproduction in pigs is one of the most economically important traits. To improve reproductive performance, numerous studies have focused on the identification of candidate genes. However, it is hard for researchers to read all of the literature thoroughly to extract this information. We have therefore developed ReCGiP, a database that provides candidate genes for pig reproduction research by mining and processing the existing biological literature on humans and pigs. Description: Based on text mining and comparative genomics, ReCGiP presents diverse information on reproduction-relevant genes in humans and pigs. The genes are sorted by their degree of relevance to reproduction topics and visualized in a gene co-occurrence network, in which two genes are connected if they are co-cited in a PubMed abstract. 'Hub' genes, which have more 'neighbors', are thought to have more important functions and can be identified by users in their web browser. In addition, ReCGiP provides integrated GO annotation, OMIM and biological pathway information collected from the Internet. Both pig and human gene information can be found in the database, which is now available. Conclusions: ReCGiP is a unique database providing information on reproduction-related genes in pigs. It can be used in molecular genetics, genetic linkage mapping, and the breeding of pigs and other livestock. Moreover, it can be used as a reference for human reproduction research.
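The network rule described above (two genes are linked if they are co-cited in one PubMed abstract, and hub genes are those with more neighbours) can be sketched as a plain adjacency map ranked by node degree. This is a minimal illustration, not the ReCGiP implementation: it assumes gene mentions have already been extracted from each abstract by text mining, and the function names and the `GENE_A`-style identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_cooccurrence_network(abstract_genes: Iterable[Set[str]]) -> Dict[str, Set[str]]:
    """Connect two genes whenever they are co-cited in the same abstract.

    Each element of abstract_genes is the set of genes mentioned in one
    abstract; genes never co-cited with another gene do not become nodes.
    """
    network: Dict[str, Set[str]] = defaultdict(set)
    for genes in abstract_genes:
        for g1, g2 in combinations(sorted(genes), 2):
            network[g1].add(g2)
            network[g2].add(g1)
    return dict(network)


def rank_hub_genes(network: Dict[str, Set[str]], top: Optional[int] = None) -> List[Tuple[str, int]]:
    """Rank genes by number of neighbours (degree), highest first; ties by name."""
    ranked = sorted(network.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(gene, len(neighbours)) for gene, neighbours in ranked][:top]
```

Degree is the simplest hub measure; weighting edges by the number of shared abstracts would be a natural refinement, but the abstract does not say whether ReCGiP does this.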

    The protection of glycyrrhetinic acid (GA) towards acetaminophen (APAP)-induced toxicity partially through fatty acids metabolic pathway

    Background: Acetaminophen (APAP)-induced liver toxicity remains the key factor limiting the clinical application of APAP, and herbs are important sources of compounds that prevent APAP-induced toxicity. Aims: To investigate, using metabolomics, the mechanism by which glycyrrhetinic acid protects against APAP-induced liver damage. Methods: An APAP-induced liver toxicity model was established by intraperitoneal (i.p.) injection of APAP (400 mg/kg). Glycyrrhetinic acid was dissolved in corn oil and administered by i.p. injection (500 mg/kg body weight) for 20 days before the APAP injection. UPLC-ESI-QTOF MS was employed to analyze the metabolomic profile of serum samples. Results: Pre-treatment with glycyrrhetinic acid significantly protected against APAP-induced toxicity, as indicated by liver histology and ALT and AST activity. Metabolomics showed that the levels of palmitoylcarnitine and oleoylcarnitine were significantly increased in the serum of APAP-treated mice, and that pre-treatment with GA can prevent the elevation of these two fatty acid-carnitines. Conclusion: Reversing the fatty acid metabolic pathway is an important mechanism by which glycyrrhetinic acid protects against acetaminophen-induced liver toxicity.

    Twinning-assisted dynamic adjustment of grain boundary mobility

    Grain boundary (GB) plasticity dominates the mechanical behaviour of nanocrystalline materials. Under mechanical loading, the GB configuration and its local deformation geometry change dynamically as deformation proceeds; the dynamic variation of GB deformability, however, remains largely elusive, especially its relation to the GB-associated deformation twins frequently observed in nanocrystalline materials. Here, attention is focused on GB dynamics in metallic nanocrystals, studied by well-designed in situ nanomechanical testing integrated with molecular dynamics simulations. GBs with low mobility are found to dynamically adjust their configurations and local deformation geometries via crystallographic twinning, which instantly changes the GB dynamics and enhances GB mobility. This self-adjusting, twin-assisted GB dynamics is found to be common in a wide range of face-centred cubic nanocrystalline metals under different deformation conditions. These findings enrich our understanding of GB-mediated plasticity, especially the dynamic behaviour of GBs, and have practical implications for developing high-performance nanocrystalline materials through interface engineering.

    Assessment of Autozygosity Derived From Runs of Homozygosity in Jinhua Pigs Disclosed by Sequencing Data

    Jinhua pig, a well-known Chinese indigenous breed, has evolved into a breed with excellent meat quality, strong disease resistance, and high prolificacy. The decline in the number of Jinhua pigs in recent years has raised concerns about inbreeding. Runs of homozygosity (ROH) along the genome have been used to quantify individual autozygosity, to improve the understanding of inbreeding depression, and to identify genes associated with traits of interest. Here, we used next-generation sequencing data to investigate the occurrence and distribution of ROH, characterizing autozygosity in 202 Jinhua pigs and identifying the genomic regions with high ROH frequencies within individuals. The average inbreeding coefficient based on ROH longer than 1 Mb was 0.168 ± 0.052. In total, 18,690 ROH were identified across all individuals, with shorter segments (1–5 Mb) predominating. Individual autosomal ROH coverage ranged from 5.32% to 29.14% in the Jinhua population. On average, approximately 16.8% of the whole genome was covered by ROH segments, with the lowest coverage on SSC11 and the highest on SSC17. A total of 824 SNPs (about 0.5%) and 11 ROH island regions occurring in over 45% of the samples were identified. Genes associated with reproduction (HOXA3, HOXA7, HOXA10, and HOXA11), meat quality (MYOD1, LPIN3, and CTNNBL1), appetite (NUCB2), and disease resistance (MUC4, MUC13, MUC20, LMLN, ITGB5, HEG1, SLC12A8, and MYLK) were identified in ROH islands. Moreover, several quantitative trait loci for ham weight and ham fat thickness were detected. The genes in ROH islands suggest, at least partially, selection for economic traits and environmental adaptation, and should be the subject of future investigation. These findings contribute to the understanding of how environmental and artificial selection shape the distribution of functional variants in the pig genome.